Parallel Formulations of Decision-Tree Classification Algorithms

Authors

  • Anurag Srivastava
  • Eui-Hong Han
  • Vipin Kumar
  • Vineet Singh
Abstract

Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of a classification decision tree learning algorithm based on induction. We describe two basic parallel formulations. One is based on the Synchronous Tree Construction Approach and the other is based on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that employs the good features of both. We also provide an analysis of the computation and communication costs of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.

Similar resources

Efficient Parallel Classification Using Dimensional Aggregates

Multidimensional aggregates are frequently computed to improve query performance in Online Analytical Processing applications. We present a new method for decision tree based classification trees using the aggregates computed in the multidimensional data model. The structure imposed on data in an explicit multidimensional storage mechanism leads to efficient dimensional operations. Decision tree ba...

SPRINT: A Scalable Parallel Classifier for Data Mining

Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the m...

Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers

One of the important problems in data mining is discovering classification models from datasets. Application domains include retail target marketing, fraud detection, and design of telecommunication service plans. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets. Algorithms for building classification decision trees have a ...

Comparing Connectionist and Symbolic Learning Methods

Experimental comparisons of back-propagation and decision tree methods have provided many data points but less understanding of why one method works better for some tasks than for others. This paper observes that, just as there are sequential and parallel classification methods, there are certain classification tasks that lend themselves to methods of one or the other type.

Early Prediction of Gestational Diabetes Using Decision Tree and Artificial Neural Network Algorithms

Introduction: Gestational diabetes is associated with many short-term and long-term complications in mothers and newborns; hence, the detection of its risk factors can contribute to the timely diagnosis and prevention of relevant complications. The present study aimed to design and compare Gestational diabetes mellitus (GDM) prediction models using artificial intelligence algorithms. Materials ...

Publication date: 1998